In [1]:
%matplotlib inline
from ggplot import *

Layers

Layers are one of the most powerful aspects of ggplot. The idea is to think of your plot as a set of components, or layers, which combine to make up the entire visual.

Take the following example.


In [2]:
ggplot(diamonds, aes(x='carat', y='price')) + geom_point() + ggtitle("Carat vs. Price")


Out[2]:
<ggplot: (284424909)>

The plot above is a scatterplot of each diamond's carat against its price. The plot is composed of 3 layers:

  • Base layer ggplot(diamonds, aes(x='carat', y='price')) -- This defines the dataset that's going to be plotted and the aesthetics (or instructions) to be used for defining the x and y axes.
  • Geom layer geom_point() -- This layer tells ggplot to render a scatter plot using the aesthetics and data defined in the base layer.
  • Labels layer ggtitle("Carat vs. Price") -- This layer applies a title to the plot. There are lots of other labels and customizations you can do to your plots (xlab, ylab, etc.).

You can keep adding layers to your plot as there are more things you'd like to see. For instance, if I wanted to customize the x and y axis labels, I could do so by adding 2 additional layers using xlab and ylab.


In [3]:
ggplot(diamonds, aes(x='carat', y='price')) + \
    geom_point() + \
    ggtitle("Carat vs. Price") + \
    xlab("         Carat\n(1 carat = 200 mg)") + \
    ylab("     Price\n(2008 USD)")


Out[3]:
<ggplot: (285241465)>

In addition to labels, you can also add more "geoms", or plot types. For instance, let's add a linear trend line to our plot using stat_smooth.


In [4]:
ggplot(diamonds, aes(x='carat', y='price')) + \
    geom_point() + \
    stat_smooth(method='lm') + \
    ggtitle("Carat vs. Price") + \
    xlab("         Carat\n(1 carat = 200 mg)") + \
    ylab("     Price\n(2008 USD)")


Out[4]:
<ggplot: (285337749)>
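
For intuition about what a linear trend line represents, here is a minimal sketch of an ordinary least-squares fit in plain Python. It assumes that method='lm' fits a straight line by least squares, as the name suggests. The fit_line helper and the sample numbers are made up for illustration; they are not part of ggplot or the diamonds dataset.

```python
# Ordinary least squares for one predictor, standard library only.
# fit_line and the data below are hypothetical, for illustration only.

def fit_line(xs, ys):
    """Return (slope, intercept) of the least-squares line through the points."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Sum of cross-deviations and sum of squared x-deviations
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept


# Made-up points that happen to lie exactly on a line
carats = [0.5, 1.0, 1.5, 2.0]
prices = [1500.0, 5000.0, 8500.0, 12000.0]

slope, intercept = fit_line(carats, prices)
print(slope, intercept)
```

The slope is the estimated change in price per additional carat, and the intercept is where the line crosses the y axis.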

It looks like there are some outlying points in our plot. Let's leave those out of the plot by using xlim and ylim. Adding these layers caps the x and y axes at whatever values we give them.


In [5]:
ggplot(diamonds, aes(x='carat', y='price')) + \
    geom_point() + \
    stat_smooth(method='lm') + \
    ggtitle("Carat vs. Price") + \
    xlab("         Carat\n(1 carat = 200 mg)") + \
    ylab("     Price\n(2008 USD)") + \
    xlim(0, 3) + \
    ylim(0, 20000)


Out[5]:
<ggplot: (289352145)>

Instead of building your ggplots with one big expression, you can break them up into individual statements. To do this, use the + or += operators to gradually tack layers onto your plot.


In [6]:
p = ggplot(aes(x='mpg'), data=mtcars)
p += geom_histogram()
p += xlab("Miles per Gallon")
p += ylab("# of Cars")
p


Out[6]:
<ggplot: (291396381)>
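
To see why + and += give the same result, here is a toy model of layering in plain Python. This is only a sketch of the general idea, not how ggplot is actually implemented; the Plot and Layer classes are hypothetical names invented for this example.

```python
# Toy model of layering; NOT ggplot's real implementation.
# Plot and Layer are hypothetical names used only for illustration.

class Layer:
    """One plot component, e.g. a geom or a label."""
    def __init__(self, name):
        self.name = name


class Plot:
    """Holds an ordered list of layers."""
    def __init__(self, layers=()):
        self.layers = list(layers)

    def __add__(self, layer):
        # Return a new Plot so the left-hand operand is left unchanged.
        return Plot(self.layers + [layer])


# One expression chained with +
chained = Plot() + Layer("geom_histogram") + Layer("xlab") + Layer("ylab")

# The same plot built step by step with +=
# (Python falls back to __add__ because Plot defines no __iadd__)
p = Plot()
p += Layer("geom_histogram")
p += Layer("xlab")
p += Layer("ylab")

chained_names = [layer.name for layer in chained.layers]
step_names = [layer.name for layer in p.layers]
print(chained_names)
print(step_names)
```

In this sketch each + returns a new object, so an intermediate plot can be kept in a variable and reused as the starting point for several variations.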
